Waiting for a psychologist in Midtjylland
Purpose of the project
If you need a psychologist in Midtjylland, the waiting time can be long, which has drawn criticism in recent years (source). Many weeks of waiting can cause a patient's condition to worsen, and this risk is not equally distributed across the municipalities. That makes the waiting time not only a problem for the individual patient but also a problem of inequality. The purpose of this project is to investigate which municipalities have the longest waiting times, and how many people live in these “vulnerable” areas and therefore risk being affected by the long waits.
About the Data
I use the following datasets:
1) The first dataset is called gadm36_DNK_2_sp_rds and is a shape file
from GADM. It is a spatial data frame containing information about the
municipalities in Denmark, including the municipality polygons.
2) The second dataset is from Region Midt and contains the waiting times
in Region Midtjylland measured in weeks. The data is from a PDF found on
xxxx, and I have manually typed the data into an Excel file (saved as
.csv), which can be found in the data folder on GitHub. My
pre-processing of the data is described in the project report.
3) The third dataset is…
Libraries
# packages needed for the script to run
library(sf)
library(raster)
library(dplyr)
library(readr)
library(mapview)
library(RColorBrewer)
library(cartogram)
library(tmap)
Load data
Loading the municipalities shape file from GADM. I convert it to an sf
object with st_as_sf and project it to EPSG:25832 with st_transform:
#load data
municipalities <- getData("GADM", country = "DNK", level = 2)
class(municipalities) # data is a SpatialPolygonsDataFrame
[1] "SpatialPolygonsDataFrame"
attr(,"package")
[1] "sp"
# convert to sf and project the data to EPSG:25832
# (the piped object is passed on automatically, so it is not repeated as an argument)
municipalities_25832 <- municipalities %>%
  st_as_sf() %>%
  st_transform(crs = 25832)
# inspecting the data
municipalities_25832
Simple feature collection with 99 features and 13 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 441745.6 ymin: 6049775 xmax: 892801.1 ymax: 6402207
Projected CRS: ETRS89 / UTM zone 32N
First 10 features:
GID_0 NAME_0 GID_1 NAME_1 NL_NAME_1 GID_2 NAME_2
36828 DNK Denmark DNK.1_1 Hovedstaden <NA> DNK.1.1_1 Albertslund
36664 DNK Denmark DNK.1_1 Hovedstaden <NA> DNK.1.2_1 Allerød
36778 DNK Denmark DNK.1_1 Hovedstaden <NA> DNK.1.3_1 Ballerup
37010 DNK Denmark DNK.1_1 Hovedstaden <NA> DNK.1.4_1 Bornholm
36900 DNK Denmark DNK.1_1 Hovedstaden <NA> DNK.1.5_1 Brøndby
VARNAME_2 NL_NAME_2 TYPE_2 ENGTYPE_2 CC_2 HASC_2
36828 <NA> <NA> Kommune Municipality <NA> DK.HS.AB
36664 <NA> <NA> Kommune Municipality <NA> DK.HS.AL
36778 <NA> <NA> Kommune Municipality <NA> DK.HS.BA
37010 <NA> <NA> Kommune Municipality <NA> DK.HS.BO
36900 <NA> <NA> Kommune Municipality <NA> DK.HS.BR
geometry
36828 MULTIPOLYGON (((712057 6173...
36664 MULTIPOLYGON (((700891 6191...
36778 MULTIPOLYGON (((715156 6178...
37010 MULTIPOLYGON (((878103.7 61...
36900 MULTIPOLYGON (((716929 6168...
[ reached 'max' / getOption("max.print") -- omitted 5 rows ]
plot(municipalities_25832[7]) # plotting the municipalities geometry
class(municipalities_25832) # sf and data frame
[1] "sf" "data.frame"
# checking if the polygons are valid
sf::st_is_valid(municipalities_25832) # they are all valid
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[ reached getOption("max.print") -- omitted 24 entries ]
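Printing one TRUE per polygon is hard to scan for 99 municipalities. A minimal sketch of summarising such a result instead, where `valid` is a hypothetical stand-in for the logical vector returned by st_is_valid(municipalities_25832):

```r
# `valid` is a hypothetical stand-in for st_is_valid(municipalities_25832)
valid <- rep(TRUE, 99)

# TRUE only if every polygon is valid
all_valid <- all(valid)

# row numbers of invalid polygons (empty when all are valid)
invalid_rows <- which(!valid)

print(all_valid)
print(invalid_rows)
```

If invalid_rows were not empty, sf::st_make_valid() would be one option for repairing those geometries.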
Filtering the municipalities and only keeping the ones in
Midtjylland. The regions are found in the
NAME_1 column. Also changing the column name
NAME_2 to the more informative name municipality:
#checking NAME_1 column containing the regions
municipalities_25832$NAME_1
 [1] "Hovedstaden" "Hovedstaden" "Hovedstaden" "Hovedstaden" "Hovedstaden"
[6] "Hovedstaden" "Hovedstaden" "Hovedstaden" "Hovedstaden" "Hovedstaden"
[11] "Hovedstaden" "Hovedstaden" "Hovedstaden" "Hovedstaden" "Hovedstaden"
[16] "Hovedstaden" "Hovedstaden" "Hovedstaden" "Hovedstaden" "Hovedstaden"
[21] "Hovedstaden" "Hovedstaden" "Hovedstaden" "Hovedstaden" "Hovedstaden"
[26] "Hovedstaden" "Hovedstaden" "Hovedstaden" "Hovedstaden" "Hovedstaden"
[31] "Midtjylland" "Midtjylland" "Midtjylland" "Midtjylland" "Midtjylland"
[36] "Midtjylland" "Midtjylland" "Midtjylland" "Midtjylland" "Midtjylland"
[41] "Midtjylland" "Midtjylland" "Midtjylland" "Midtjylland" "Midtjylland"
[46] "Midtjylland" "Midtjylland" "Midtjylland" "Midtjylland" "Nordjylland"
[51] "Nordjylland" "Nordjylland" "Nordjylland" "Nordjylland" "Nordjylland"
[56] "Nordjylland" "Nordjylland" "Nordjylland" "Nordjylland" "Nordjylland"
[61] "Sjælland" "Sjælland" "Sjælland" "Sjælland" "Sjælland"
[66] "Sjælland" "Sjælland" "Sjælland" "Sjælland" "Sjælland"
[71] "Sjælland" "Sjælland" "Sjælland" "Sjælland" "Sjælland"
[ reached getOption("max.print") -- omitted 24 entries ]
#creating new data frame containing only data for Midtjylland
Midtjylland <- municipalities_25832[municipalities_25832$NAME_1 == 'Midtjylland',]
#change "NAME_2" to "municipality"
Midtjylland <- Midtjylland %>%
  rename("municipality" = "NAME_2")
sort(unique(Midtjylland$municipality))
 [1] "Århus" "Favrskov" "Hedensted"
[4] "Herning" "Holstebro" "Horsens"
[7] "Ikast-Brande" "Lemvig" "Norddjurs"
[10] "Odder" "Randers" "Ringkøbing-Skjern"
[13] "Samsø" "Silkeborg" "Skanderborg"
[16] "Skive" "Struer" "Syddjurs"
[19] "Viborg"
# replace "Århus" with "Aarhus" to match the other datasets
Midtjylland$municipality[Midtjylland$municipality == "Århus"] <- "Aarhus"
Waiting times in Region Midt
I will combine the municipalities data with my data on the waiting times in the different municipalities in Midtjylland.
Loading the waitingtime_regionmidt.csv data set and
inspecting it. The waiting-time data spells the municipality “Aarhus”,
which is why “Århus” in the Midtjylland data was renamed to “Aarhus” above:
# load the waiting data
waitingtime <- read.csv2("../data/waitingtime_regionmidt.csv")
# Check column names
head(waitingtime)
  Kommune Gennemsnit.pr..1..Now.2021 Maksimum.pr..1..Nov.2021
1 Favrskov 14 26
2 Hedensted 4 4
3 Herning 17 27
4 Holstebro 22 30
5 Horsens 17 40
6 Ikast-Brande 17 22
Minimum.pr..1..Nov.2021
1 3
2 4
3 6
4 8
5 8
6 6
# Translate names from Danish to English
waitingtime <- waitingtime %>%
  rename(
    "municipality" = "Kommune",
    "average_wait" = "Gennemsnit.pr..1..Now.2021",
    "max_wait" = "Maksimum.pr..1..Nov.2021",
    "min_wait" = "Minimum.pr..1..Nov.2021"
  )
# Check names
sort(unique(waitingtime$municipality))
 [1] "Aarhus" "Favrskov" "Hedensted"
[4] "Herning" "Holstebro" "Horsens"
[7] "Ikast-Brande" "Lemvig" "Norddjurs"
[10] "Odder" "Randers" "Region"
[13] "Ringk\xbfbing-Skjern" "Sams\xbf" "Silkeborg"
[16] "Skanderborg" "Skive" "Struer"
[19] "Syddjurs" "Viborg"
# fixing misspellings due to danish special characters
waitingtime$municipality[waitingtime$municipality == "Ringk\xbfbing-Skjern"] <- "Ringkøbing-Skjern"
waitingtime$municipality[waitingtime$municipality == "Sams\xbf"] <- "Samsø"
class(waitingtime) # it's a data frame
[1] "data.frame"
I will also add the population in each municipality to the
waiting_midt data set:
# load the population data
population <- read.csv2("../data/population_over_18.csv", header=FALSE)
# set column names
colnames(population) <- c("municipality", "age", "population")
# take only the "subtotal" rows (rows with total pop over 18 for each municipality)
total_population <- population[population$age == 'Subtotal',]
# remove column "age"
total_population_NoAge <- total_population[,-2]
#check names
sort(unique(total_population$municipality))
 [1] "Aarhus" "Favrskov" "Hedensted"
[4] "Herning" "Holstebro" "Horsens"
[7] "Ikast-Brande" "Lemvig" "Norddjurs"
[10] "Odder" "Randers" "Region Midtjylland"
[13] "Ringk\xf8bing-Skjern" "Sams\xf8" "Silkeborg"
[16] "Skanderborg" "Skive" "Struer"
[19] "Subtotal" "Syddjurs" "Viborg"
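The escaped bytes in the names above (for example \xf8 in place of ø) suggest that the file was read with a different encoding than the one it was saved in. As an alternative to replacing each name by hand, the sketch below converts such strings with base R's iconv(). It assumes the bytes are Latin-1, which is a guess based on \xf8 being ø in that encoding; the sample vector is only for illustration.

```r
# Sample names containing the raw byte \xf8, as in the output above
raw_names <- c("Ringk\xf8bing-Skjern", "Sams\xf8", "Viborg")

# Assumption: the bytes are Latin-1 (ISO-8859-1), where \xf8 is the letter ø
fixed_names <- iconv(raw_names, from = "latin1", to = "UTF-8")

print(fixed_names)
```

The same idea can be applied when loading, since read.csv2() accepts a fileEncoding argument. The \xbf bytes in the waiting-time and practices files point to a different source encoding, so the `from` value would need to be adjusted for those.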
# fixing misspellings due to danish special characters
total_population$municipality[total_population$municipality == "Ringk\xf8bing-Skjern"] <- "Ringkøbing-Skjern"
total_population$municipality[total_population$municipality == "Sams\xf8"] <- "Samsø"
Load practices data set
practices <- read.csv2("../data/psycology_practices_2017.csv")
# Check names
sort(unique(practices$municipality))
 [1] "Aarhus" "Favrskov" "Hedensted"
[4] "Herning" "Holstebro" "Horsens"
[7] "Ikast-Brande" "Lemvig" "Norddjurs"
[10] "Odder" "Randers" "Ringk\xbfbing-Skjern"
[13] "Sams\xbf" "Silkeborg" "Skanderborg"
[16] "Skive" "Struer" "Syddjurs"
[19] "Viborg"
#fixing misspellings due to danish special characters
practices$municipality[practices$municipality == "Ringk\xbfbing-Skjern"] <- "Ringkøbing-Skjern"
practices$municipality[practices$municipality == "Sams\xbf"] <- "Samsø"
Merge data
Now I combine the four data sets so that the municipalities are matched with each other:
# merge the four data frames on the shared "municipality" column
waiting_midt <- Midtjylland %>%
  left_join(waitingtime, by = "municipality") %>%
  left_join(total_population, by = "municipality") %>%
  left_join(practices, by = "municipality")
#view data
waiting_midt
Simple feature collection with 19 features and 19 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 443720.8 ymin: 6169518 xmax: 662597 ymax: 6300334
Projected CRS: ETRS89 / UTM zone 32N
First 10 features:
GID_0 NAME_0 GID_1 NAME_1 NL_NAME_1 GID_2 municipality VARNAME_2
1 DNK Denmark DNK.2_1 Midtjylland <NA> DNK.2.1_1 Aarhus <NA>
2 DNK Denmark DNK.2_1 Midtjylland <NA> DNK.2.2_1 Favrskov <NA>
3 DNK Denmark DNK.2_1 Midtjylland <NA> DNK.2.3_1 Hedensted <NA>
NL_NAME_2 TYPE_2 ENGTYPE_2 CC_2 HASC_2 average_wait max_wait min_wait
1 <NA> Kommune Municipality <NA> DK.MJ.AR 21 65 5
2 <NA> Kommune Municipality <NA> DK.MJ.FA 14 26 3
3 <NA> Kommune Municipality <NA> DK.MJ.HD 4 4 4
age population practices geometry
1 Subtotal 291394 60 MULTIPOLYGON (((575822.2 62...
2 Subtotal 37058 6 MULTIPOLYGON (((561470 6229...
3 Subtotal 37008 3 MULTIPOLYGON (((542217.9 61...
[ reached 'max' / getOption("max.print") -- omitted 7 rows ]
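A left join keeps every row of the left table and fills in NA where a key has no match, so a misspelled municipality name silently leaves a gap rather than raising an error. A small sketch of checking the join keys beforehand with base R's setdiff(); the name vectors are shortened, illustrative samples based on the names printed earlier:

```r
# Shortened sample of names from the spatial data, before "Århus" is renamed
# ("\u00c5rhus" is "Århus" and "Sams\u00f8" is "Samsø", written with Unicode escapes)
map_names <- c("\u00c5rhus", "Favrskov", "Sams\u00f8")

# Shortened sample of names from the waiting-time table
wait_names <- c("Aarhus", "Favrskov", "Sams\u00f8", "Region")

# Names in the map with no partner in the table: these get NA after a left join
unmatched_in_map <- setdiff(map_names, wait_names)

# Names only in the table: these rows are dropped by a left join from the map
only_in_table <- setdiff(wait_names, map_names)

print(unmatched_in_map)
print(only_in_table)
```

For the real data, the same two calls can be run on Midtjylland$municipality and waitingtime$municipality. Summary rows such as "Region" are expected to show up only in the table.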
Save the new merged data set in the data_output folder:
setwd('..') # change the working directory to the parent folder
readr::write_csv(waiting_midt,"data_output/waiting_midt.csv")
Which municipalities have the longest waiting time?
Now that the data is ready, I will investigate the waiting time in each of the 19 municipalities in Midtjylland. I found that the best way to do this is to visualize the waiting time on a map with mapview():
# plot waiting times using mapview
waiting_mapview <- mapview(waiting_midt[c(7, 14, 15, 16)], zcol = "average_wait", col.regions=rev(brewer.pal(10, "RdBu")))
# show map
waiting_mapview
Save mapview map as HTML:
mapshot(waiting_mapview, "../map_output/waiting_mapview.html")
Visualize with Tmap
How many people over 18 live in the different municipalities? Is there a correlation between long waiting times and a large population in this age group? How many people live in the “vulnerable” areas?
# Visualize the population (over 18 years) in each municipality
population_map <- tm_shape(waiting_midt["population"]) +
tm_polygons("population", palette = "YlGn", title = "Population (over age 18)", style = "jenks") +
tm_layout(frame = FALSE, legend.position = c("right", "top"))
average_waiting_map <- tm_shape(waiting_midt["average_wait"]) +
tm_polygons("average_wait", palette = "Reds", title = "Waiting time (weeks)", style = "jenks") +
tm_layout(frame = FALSE, legend.position = c("right", "top"))
# Show the two maps
population_map
average_waiting_map
Save the maps side by side:
# Save the two maps in the same pdf
pdf("../map_output/waiting_and_population_maps.pdf")
tmap_arrange(population_map, average_waiting_map)
dev.off()
quartz_off_screen
2
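The two maps allow a visual comparison only. The correlation question raised above can also be checked numerically. The sketch below uses the three municipalities printed in the merged output (Aarhus, Favrskov, Hedensted), which is far too few to draw conclusions from and only shows the mechanics. Spearman's rank correlation is an assumption made here, chosen because the population of Aarhus is much larger than the others.

```r
# Values copied from the three rows shown in the merged output
average_wait <- c(21, 14, 4)             # weeks
population   <- c(291394, 37058, 37008)  # residents over 18

# Rank-based correlation, less sensitive to the very large value for Aarhus
rho <- cor(average_wait, population, method = "spearman")
print(rho)
```

On the full data, the same call would take waiting_midt$average_wait and waiting_midt$population.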
Plot relative number of clinics
Calculating the number of psychologist practices relative to the adult population in each municipality:
# calculate number of practices per 1000 persons over 18 in each municipality
waiting_midt$relative_practices <- waiting_midt$practices / (waiting_midt$population / 1000)
Now it’s time to plot the relative number of practices so they can be compared with the waiting times:
# create centroid for each municipality
centroids <- st_centroid(waiting_midt)
# use tmap to draw a bubble at each centroid, with a size corresponding to the relative number of practices
practices_map <- tm_shape(waiting_midt) +
tm_polygons("average_wait", palette = "Reds", title = "Waiting time in weeks", style = "jenks") +
tm_layout(frame = FALSE, legend.position = c("right", "top")) +
tm_shape(centroids) +
tm_bubbles(size = "relative_practices", scale = 2.5, style = "jenks", col = "lightblue", alpha = 1, border.col = "blue") +
tm_layout(frame = FALSE, legend.position = c("right", "top"), title = "Waiting time and number of practices pr 1000 persons", legend.title.size = 1.0, legend.title.fontface = "bold")
# show map
practices_map
# Save bubble-map
tmap_save(practices_map, "../map_output/relative_practizes_map.png")
Sources:
Merging data: https://www.statmethods.net/management/merging.html
Rename column in data frame: https://sparkbyexamples.com/r-programming/rename-column-in-r/
Combine data frame with spatial data frame: https://stackoverflow.com/questions/56116443/how-do-i-combine-a-dataframe-with-a-spatial-dataframe-when-receiving-errors-with
Create cartogram: https://cran.r-project.org/web/packages/cartogram/readme/README.html
Color palette to mapview map: https://stackoverflow.com/questions/60099307/add-color-palette-to-mapview-map